Finite-state Relations Between Two Historically Closely Related Languages

Author

  • Kimmo Koskenniemi
Abstract

Regular correspondences between historically related languages can be modelled using finite-state transducers (FSTs). A new method is presented and demonstrated with a bidirectional experiment between Finnish and Estonian. An artificial representation (resembling a protolanguage) is established between the two related languages. This representation, AFE (Aligned Finnish-Estonian), is based on a letter-by-letter alignment of the two languages and uses mechanically constructed morphophonemes which represent the corresponding characters. By describing the constraints of AFE with two-level rules, one may construct useful mappings between the languages. In this way, the badly ambiguous FSTs from Finnish and Estonian to AFE can be composed into a practically unambiguous transducer from Finnish to Estonian. The inverse mapping from Estonian to Finnish is mildly ambiguous. The steps of the proposed method could be repeated as such with dialectal or older written texts: choosing a set of model words, aligning them, recording the mechanical correspondences and designing rules for the constraints could be done with limited effort. For the purposes of indexing and searching, the mild ambiguity may be tolerable as such. The ambiguity can be further reduced by composing the resulting FST with a speller or morphological analyser of the standard language.
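As a rough illustration of the alignment step described in the abstract, the following Python sketch aligns a Finnish word with its Estonian cognate letter by letter and derives pair symbols that play the role of morphophonemes. Everything here is an assumption made for illustration: the abstract does not say how the alignment is computed, so plain edit distance stands in for it, and the zero symbol `Ø`, the function names `align`, `morphophonemes` and `project`, and the two-character pair notation are hypothetical rather than the paper's own.

```python
ZERO = "Ø"  # hypothetical zero symbol marking a letter with no counterpart


def align(fin, est):
    """Align two words letter by letter with plain edit distance.

    This is a stand-in for the paper's alignment, not its actual method.
    Returns a list of (finnish_letter, estonian_letter) pairs.
    """
    n, m = len(fin), len(est)
    # cost[i][j] = minimal edit cost of aligning fin[:i] with est[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (fin[i - 1] != est[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the end, preferring letter-to-letter pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (fin[i - 1] != est[j - 1])):
            pairs.append((fin[i - 1], est[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((fin[i - 1], ZERO))  # Finnish letter, nothing in Estonian
            i -= 1
        else:
            pairs.append((ZERO, est[j - 1]))  # Estonian letter, nothing in Finnish
            j -= 1
    pairs.reverse()
    return pairs


def morphophonemes(pairs):
    """Identical letters stay as they are; differing pairs become one symbol."""
    return [f if f == e else f + e for f, e in pairs]


def project(afe, side):
    """Read one language back out of the aligned form (0 = Finnish, 1 = Estonian)."""
    letters = (sym[side] if len(sym) == 2 else sym for sym in afe)
    return "".join(ch for ch in letters if ch != ZERO)


afe = morphophonemes(align("silmä", "silm"))
print(afe)                            # ['s', 'i', 'l', 'm', 'äØ']
print(project(afe, 0), project(afe, 1))  # silmä silm
```

Reading the aligned form back out on either side is trivial, whereas going from a single language to the aligned form is ambiguous (a Finnish `ä` could correspond to `ä` or to nothing). That is the ambiguity the abstract says is constrained by two-level rules before the two transducers are composed.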

Similar resources

Analysis of Language Legislation of All 85 Russian Federation’s Subjects (Regions)

The analysis of the language legislation of all 85 subjects of the Russian Federation shows complete heterogeneity and diversity. Common legal guidelines in Federal law do not exist, because Federal legislation is obsolete and is largely marked by gaps and conflicts. The subjects of the Russian Federation, on whose territory different ethnic groups, both large and indigenous, historically live, solv...

A Machine Translation System Between a Pair of Closely Related Languages

Machine translation between closely related languages is easier than between language pairs that are not related to each other. Having many parts of their grammars and vocabularies in common reduces the amount of effort needed to develop a translation system between related languages. A translation system that makes a morphological analysis supported by simpler translation rules and context d...

On the Graphs Related to Green Relations of Finite Semigroups

In this paper we develop an analog of the notion of the conjugacy graph of finite groups for the finite semigroups by considering the Green relations of a finite semigroup. More precisely, by defining the new graphs $\Gamma_{L}(S)$, $\Gamma_{H}(S)$, $\Gamma_{J}(S)$ and $\Gamma_{D}(S)$ (we name them the Green graphs) related to the Green relations L, R, J, H and D of a finite semigroup S, we first atte...

Internship report - Streaming String Transducers

In formal language theory, two very different models sometimes turn out to describe the same class of languages. This usually shows that there is a fundamental concept described by those models. A well-known example is the class of regular languages, which can be characterized by logic (monadic second order (MSO) logic), algebra (syntactic monoids), and many computational models (automata). In ...

Using Mazurkiewicz Trace Languages for Partition-Based Morphology

Partition-based morphology is an approach of finite-state morphology where a grammar describes a special kind of regular relations, which split all the strings of a given tuple into the same number of substrings. They are compiled in finite-state machines. In this paper, we address the question of merging grammars using different partitionings into a single finite-state machine. A morphological...

Publication date: 2013